[codex] Return tool timeouts without draining blocked bodies #976

Open
mimeding wants to merge 7 commits into osaurus-ai:main from mimeding:codex/tool-timeout-nonblocking-race
@mimeding
Contributor

@mimeding mimeding commented Apr 29, 2026

Summary

  • replace the task-group timeout race with a single-resume race state
  • return timeout envelopes immediately when the timeout wins instead of waiting for the tool body to observe cancellation
  • add a regression test for non-cooperative blocking tool bodies
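
The single-resume idea in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in ToolRegistry.swift: `ToolOutcome`, `RaceState`, and `runWithTimeout` are hypothetical names, and the real timeout envelope is assumed to carry more detail than a bare enum case. The tool body and the timer each try to resume one shared continuation, and only the first attempt succeeds.

```swift
import Foundation

/// Hypothetical result type; the real timeout envelope may look different.
enum ToolOutcome: Sendable, Equatable {
    case success(String)
    case timedOut
}

/// Holds the continuation and hands it out at most once.
final class RaceState: @unchecked Sendable {
    private let lock = NSLock()
    private var continuation: CheckedContinuation<ToolOutcome, Never>?

    init(_ continuation: CheckedContinuation<ToolOutcome, Never>) {
        self.continuation = continuation
    }

    /// The first caller resumes the continuation; later callers are no-ops.
    func resume(with outcome: ToolOutcome) {
        lock.lock()
        let pending = continuation
        continuation = nil
        lock.unlock()
        pending?.resume(returning: outcome)
    }
}

/// Races `body` against a timer using unstructured tasks, so the caller
/// is never forced to wait for a body that ignores cancellation.
func runWithTimeout(
    seconds: Double,
    body: @escaping @Sendable () async -> String
) async -> ToolOutcome {
    await withCheckedContinuation { continuation in
        let state = RaceState(continuation)
        let work = Task {
            let value = await body()
            state.resume(with: .success(value))
        }
        Task {
            try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            // Report the timeout first, then signal cancellation, so a body
            // that wakes up on cancel cannot win the race late.
            state.resume(with: .timedOut)
            work.cancel()
        }
    }
}
```

In this sketch a body that never observes cancellation keeps running in the background after the timeout is reported; the sketch only guarantees that the caller is not held up by it. The timer task is also left to expire when the body wins, which a fuller version would likely cancel.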

Why

PR #927 currently fails test-core in slowToolReturnsTimeoutEnvelopeBeforeBudgetExpires. The underlying runtime also has a real production hazard: task-group scope exit drains cancelled children, so a blocked/non-cooperative tool can delay timeout reporting.
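
The drain behavior described above can be reproduced with a small standalone sketch. The function names and durations here are illustrative assumptions, not taken from the PR. `withTaskGroup` does not return until every child task has finished, so even when the timer child completes first, the call takes at least as long as the blocking child.

```swift
import Foundation

/// Synchronous helper that blocks the current thread and ignores cancellation.
func blockThread(for seconds: Double) {
    Thread.sleep(forTimeInterval: seconds)
}

/// Races a blocking body against a timer inside a task group and reports
/// which child finished first and how long the whole call took.
func taskGroupRace(timeout: Double, blockFor: Double) async -> (String, Double) {
    let start = Date()
    let winner = await withTaskGroup(of: String.self) { group in
        group.addTask {
            blockThread(for: blockFor)   // non-cooperative tool body
            return "body"
        }
        group.addTask {
            try? await Task.sleep(nanoseconds: UInt64(timeout * 1_000_000_000))
            return "timeout"
        }
        let first = await group.next() ?? "none"
        group.cancelAll()                // cancellation is only a request
        return first
        // On scope exit the group still awaits the blocked child.
    }
    return (winner, Date().timeIntervalSince(start))
}
```

Calling `taskGroupRace(timeout: 0.05, blockFor: 0.5)` should take about half a second regardless of which child wins, which is the delay in timeout reporting that this PR aims to remove.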

Verification

  • swift test --filter ToolRegistryTimeoutTests
  • swiftlint --strict Packages/OsaurusCore/Tools/ToolRegistry.swift Packages/OsaurusCore/Tests/Tool/ToolRegistryTimeoutTests.swift

@mimeding
Contributor Author

Follow-up from the live debug pass: the code change itself still verifies locally.

Local checks:

  • swift test --filter ToolRegistryTimeoutTests passed: 3/3 tests.
  • swiftlint --strict Packages/OsaurusCore/Tools/ToolRegistry.swift Packages/OsaurusCore/Tests/Tool/ToolRegistryTimeoutTests.swift passed.

GitHub test-core is failing before tests execute with the same EventSource module-resolution class (CAsyncHTTPClient, CNIOLLHTTP, CNIOExtrasZlib, CNIOPosix, _NumericsShims). PR #975 is the shared CI/DerivedData fix and is now fully green, so I would land #975 first, then rerun this PR rather than treating the timeout patch as the failing root cause.

@mimeding
Contributor Author

Status update after adding the clean-PR rule: this PR is now draft because its attached GitHub checks are not clean yet. It should move back to ready only after #975 lands/rebases into the branch and scripts/ci/check-pr-clean.sh osaurus-ai/osaurus 976 passes.

@mimeding
Contributor Author

Reran #976 after local verification. Local status is clean:

  • git diff --check origin/main...HEAD
  • strict SwiftLint on ToolRegistry.swift and ToolRegistryTimeoutTests.swift
  • cd Packages/OsaurusCore && swift test --filter ToolRegistryTimeoutTests (3 tests passed)
  • cd Packages/OsaurusCore && swift build

I could not rerun the failed GitHub job directly because GitHub requires repository admin rights for gh run rerun, so I pushed an empty rerun commit (ef9afe3e). The fresh GitHub run still fails only in test-core, with the same EventSource dependency-resolution class:

  • CAsyncHTTPClient
  • CNIOLLHTTP
  • CNIOExtrasZlib
  • CNIOPosix
  • _NumericsShims

No timeout-code regression is visible. Keeping this PR draft until #975 lands or the EventSource CI class is otherwise fixed.

@mimeding mimeding force-pushed the codex/tool-timeout-nonblocking-race branch from ef9afe3 to 9f315a0 April 30, 2026 10:48
@mimeding mimeding marked this pull request as ready for review April 30, 2026 11:43